Techniques for Automatically Transcribing Unknown Keywords for Open Keyword Set HMM-based Word-spotting
Abstract
Many word-spotting applications require an open keyword vocabulary, allowing the user to search for any term in an audio document database. This in turn requires an automatic method of determining the acoustic representation of an arbitrary keyword. In an HMM-based system, where the keyword is represented by a concatenated string of phones (the keyword phone string, KPS), the phonetic transcription must be estimated. This report describes automatic transcription methods for three keyword input modes: orthographically spelt, spoken, and combined spelt and spoken. The spoken keyword case is examined in more detail for two reasons. Firstly, interacting with an audio-based system by voice is more natural than typing at a keyboard or speaking the orthographic spelling; this is of particular interest for hand-held devices with a limited keyboard or none at all. Secondly, retrieval requests are likely to contain many proper names and user-defined jargon terms, which are difficult to cover fully in spelling-based systems. The basic approach considered is to use a phone-level speech recogniser to hypothesise one or more keyword transcriptions. The effect on the KPSs of the number of pronunciation strings, the HMM complexity, the language model used in the phone recogniser, and the number of sample keyword utterances is evaluated through a series of speaker-dependent word-spotting experiments on spontaneous speech messages from the Video Mail Retrieval database. Overall, speech-derived KPSs were found to be less robust than KPSs defined by a phonetic dictionary. However, because the speech-based system does not use a dictionary, it can handle any word or sound, and it also requires less memory.
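The phone-level recogniser at the heart of this approach can be illustrated with a minimal Viterbi decoder over a null-grammar phone loop, in which any phone may follow any other with equal probability. This is a sketch under simplifying assumptions, not the report's implementation: each phone is modelled by a single emitting state with a fixed self-loop probability (real monophone HMMs typically have several states with Gaussian-mixture outputs), and the per-frame acoustic log-likelihoods `log_emis` are supplied directly as hypothetical inputs. The function name `phone_loop_viterbi` and the parameter `self_loop` are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def phone_loop_viterbi(
    log_emis: Sequence[Sequence[float]],
    phones: Sequence[str],
    self_loop: float = 0.6,
) -> Tuple[List[str], float]:
    """Decode the best phone string through a null-grammar phone loop.

    Assumption: each phone is a single emitting state; log_emis[t][p] is the
    acoustic log-likelihood of frame t under phone p (supplied by the caller).
    """
    n_phones = len(phones)
    log_uniform = math.log(1.0 / n_phones)                 # null grammar: no phone LM
    log_stay = math.log(self_loop)                         # remain in the current phone
    log_switch = math.log(1.0 - self_loop) + log_uniform   # leave, then enter any phone

    # delta[p]: best log score of a path ending in phone p at the current frame
    delta = [log_uniform + log_emis[0][p] for p in range(n_phones)]
    backptr: List[List[Tuple[int, bool]]] = []  # per frame: (previous phone, new segment?)

    for t in range(1, len(log_emis)):
        # With a uniform loop, the best predecessor for a new phone is the same for every p.
        q_best = max(range(n_phones), key=lambda q: delta[q])
        enter = delta[q_best] + log_switch
        new_delta, frame_bp = [], []
        for p in range(n_phones):
            stay = delta[p] + log_stay
            if stay >= enter:
                new_delta.append(stay + log_emis[t][p])
                frame_bp.append((p, False))
            else:
                new_delta.append(enter + log_emis[t][p])
                frame_bp.append((q_best, True))
        delta = new_delta
        backptr.append(frame_bp)

    # Trace back from the best final phone, emitting a phone at each segment boundary.
    p = max(range(n_phones), key=lambda q: delta[q])
    best_score = delta[p]
    phone_string = [phones[p]]
    for t in range(len(log_emis) - 1, 0, -1):
        prev, new_segment = backptr[t - 1][p]
        if new_segment:
            phone_string.append(phones[prev])
        p = prev
    phone_string.reverse()
    return phone_string, best_score


if __name__ == "__main__":
    phones = ["k", "ae", "t"]
    frames = [0, 0, 1, 1, 2, 2]  # index of the phone each toy frame favours
    log_emis = [[math.log(0.8) if p == f else math.log(0.1) for p in range(3)] for f in frames]
    print(phone_loop_viterbi(log_emis, phones)[0])  # ['k', 'ae', 't']
```

The decoded string would serve as one KPS for the spoken keyword. The report also varies HMM complexity and the recogniser's language model; in this toy setting those correspond to the source of the emission scores and to the uniform `log_uniform` term, respectively.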
Given a single keyword utterance, producing multiple keyword pronunciations with an N-best recogniser (N = 7) was found to give the best word-spotting performance: a 9.3% drop relative to the phonetic-dictionary-defined system, using a null-grammar, monophone HMM-based KPS recogniser. When two utterances are available, greater robustness can be achieved, since the problem of poor keyword examples is partially overcome. Again, the 7-best approach yielded the best performance (a 6.1% relative drop), but good performance was also achieved using the Viterbi string for each utterance (an 8.5% relative drop), which has a lower computational cost.
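The N-best and Viterbi-string strategies differ mainly in how the recogniser's hypotheses become the keyword's set of pronunciations. The sketch below assumes each example utterance has already been decoded into an N-best list of (phone string, log score) pairs. The function names, the choice to pool variants from several utterances by union, and the rule of scoring a putative hit by its best-matching variant are illustrative assumptions, not details given in the report. With `n=7` it mirrors the 7-best setting; with `n=1` per utterance it reduces to the Viterbi-string approach.

```python
from typing import Dict, List, Sequence, Tuple

PhoneString = Tuple[str, ...]
Hypothesis = Tuple[Sequence[str], float]  # (phone sequence, recogniser log score)


def kps_from_nbest(nbest: Sequence[Hypothesis], n: int) -> List[PhoneString]:
    """Keep up to n distinct phone strings from one utterance's N-best list, best first."""
    ranked = sorted(nbest, key=lambda hyp: hyp[1], reverse=True)
    kps_set: List[PhoneString] = []
    for phones, _score in ranked:
        if len(kps_set) >= n:
            break
        candidate = tuple(phones)
        if candidate not in kps_set:  # identical strings add no new pronunciation
            kps_set.append(candidate)
    return kps_set


def kps_from_utterances(per_utterance: Sequence[Sequence[Hypothesis]], n: int) -> List[PhoneString]:
    """Pool the KPSs of several keyword examples (assumed: union, first occurrence kept)."""
    pooled: List[PhoneString] = []
    for nbest in per_utterance:
        for kps in kps_from_nbest(nbest, n):
            if kps not in pooled:
                pooled.append(kps)
    return pooled


def keyword_score(variant_scores: Dict[PhoneString, float]) -> float:
    """Assumed scoring rule: a putative hit takes the score of its best-matching KPS."""
    return max(variant_scores.values())


if __name__ == "__main__":
    utt1 = [(["k", "ae", "t"], -10.0), (["k", "ah", "t"], -11.0), (["g", "ae", "t"], -12.5)]
    utt2 = [(["k", "ae", "t"], -9.0), (["k", "ae", "d"], -9.5)]
    print(kps_from_nbest(utt1, 7))               # [('k', 'ae', 't'), ('k', 'ah', 't'), ('g', 'ae', 't')]
    print(kps_from_utterances([utt1, utt2], 1))  # [('k', 'ae', 't')]
```

Taking the maximum over variants is what a Viterbi search through parallel pronunciation paths in a keyword network would effectively do; the cost of that search grows with the number of KPSs, which is consistent with the report's remark that the per-utterance Viterbi strings are cheaper than a 7-best set.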
Similar resources
An Efficient Keyword Spotting Techni Language for Filler Mo
The task of keyword spotting is to detect a set of keywords in continuous input speech. In a keyword spotter, not only the keywords but also the non-keyword intervals must be modelled. For this purpose, filler (or garbage) models are used. To date, most keyword spotters have been based on hidden Markov models (HMMs). More specifically, a set of HMMs is used as garbage models. In this p...
Performance Improvement in Keyword Spotting for Telephony Services
In this paper, a new hybrid approach is presented for keyword spotting. The proposed method is based on hidden Markov models (HMMs) and is performed in two stages. In the first stage, a series of candidate keywords is recognized using phoneme models. In the second stage, word models are used to decide whether to accept or reject each candidate keyword. Two different methods are presented in...
A hybrid HMM/DNN approach to keyword spotting of short words
An HMM/DNN framework is proposed to address the issues of short-word detection. The first-stage keyword hypothesizer is redesigned with a context-aware keyword model and a 9-state filler model, reducing the miss rate from 80% to 6% and increasing the figure of merit (FOM) from 6.08% to 21.88% for short words. The hypothesizer is followed by an MLP-based second-stage keyword verifier to further redu...
Spanish Keyword Spotting System Based on Filler Models, Pseudo N-gram Language Model and a Confidence Measure
To organize many hours of audio content, such as meetings and radio news, efficiently, searching for spoken keywords is essential. One approach uses filler models to account for non-keyword intervals. Another approach uses a large vocabulary continuous speech recognition (LVCSR) system, which outputs a word string and then searches for the keywords in that string. This approach yields high p...
Lexical Access-based Confidence Measure for a Spanish Keyword Spotting System
Keyword spotting deals with searching for a reduced set of keywords in audio content. Phone-lattice-based approaches are very fast but achieve poor results. HMM-based keyword spotting systems use filler models to absorb out-of-vocabulary (OOV) words and achieve the best results, although they are slower. We propose a technique which combines them in order to perform a confidence measure to...